Warehousing Web Data

Authors

  • Jérôme Darmont
  • Omar Boussaïd
  • Fadila Bentayeb
Abstract

In a data warehousing process, mastering the data preparation phase yields substantial gains in time and performance when performing multidimensional analysis or running data mining algorithms. Furthermore, a data warehouse can require external data, and the web is a prevalent data source in this context. In this paper, we propose a modeling process for integrating diverse and heterogeneous (so-called multiform) data into a unified format. The schema definition itself provides first-rate metadata in our data warehousing context. At the conceptual level, a complex object is represented in UML. Our logical model is an XML schema that can be described with a DTD or the XML Schema language. Finally, we have designed a Java prototype that transforms our multiform input data into XML documents representing our physical model. The XML documents we obtain are then mapped into a relational database we view as an ODS (Operational Data Storage), whose content will have to be re-modeled in a multidimensional way to allow its storage in a star schema-based warehouse and, later, its analysis.
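
As a rough illustration of the last two steps in the abstract (multiform input wrapped as XML, then mapped into a relational ODS), here is a minimal Python sketch. It is not the authors' Java prototype: the element names (`complex_object`, `subdocument`, `length`, `content`), the table layout, and the helper names `to_xml` and `load_into_ods` are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def to_xml(name: str, kind: str, content: str) -> ET.Element:
    """Wrap one input item in a uniform XML envelope (hypothetical layout)."""
    obj = ET.Element("complex_object", {"name": name})
    sub = ET.SubElement(obj, "subdocument", {"type": kind})
    # Metadata is stored next to the payload so it can be queried later.
    ET.SubElement(sub, "length").text = str(len(content))
    ET.SubElement(sub, "content").text = content
    return obj


def load_into_ods(conn: sqlite3.Connection, obj: ET.Element) -> None:
    """Map each subdocument of the XML object to one row of an ODS table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS subdocument ("
        "object_name TEXT, type TEXT, length INTEGER, content TEXT)"
    )
    for sub in obj.iter("subdocument"):
        conn.execute(
            "INSERT INTO subdocument VALUES (?, ?, ?, ?)",
            (
                obj.get("name"),
                sub.get("type"),
                int(sub.findtext("length")),
                sub.findtext("content"),
            ),
        )
    conn.commit()


# Usage: one text item goes through both steps, using an in-memory database.
conn = sqlite3.connect(":memory:")
page = to_xml("home_page", "text", "Warehousing Web Data")
load_into_ods(conn, page)
rows = conn.execute(
    "SELECT object_name, type, length FROM subdocument"
).fetchall()
print(rows)
```

A flat table keeps the sketch short. A star schema would additionally split the descriptive attributes into dimension tables around a fact table, which is the re-modeling step the abstract leaves for later.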


Similar resources

Chapter I, "Combining Data Warehousing and Data Mining Techniques for Web Log Analysis"

In enterprises, a large volume of data has been collected and stored in data warehouses. Advances in data gathering, storage, and distribution have created a need for integrating data warehousing and data mining techniques. Mining data warehouses raises unique issues and requires special attention. Data warehousing and data mining are interrelated, and require holistic techniques from the two ...


Warehousing complex data from the web

Data warehousing and OLAP technologies are now moving toward handling complex data that mostly originate from the Web. However, integrating such data into a decision-support process requires representing them in a form processable by OLAP and/or data mining techniques. We present in this paper a complex data warehousing methodology that exploits XML as a pivot language. Our approach inc...


Materializing Web Data

Business decisions must rely not only on company-internal data but also on external data from competitors or relevant events. This information can be obtained from the WWW but must be integrated with the data in a company's data warehouse. In this paper we discuss a system architecture for warehousing Web content for OLAP and DSS. A self-describing object model is used to make the implicit mode...


A Data Warehousing and Data Mining Framework for Web Usage Management

A new challenge in Web usage analysis is how to manage and discover informative patterns from various types of Web data stored in structured or unstructured databases for system monitoring and decision making. In this paper, a novel integrated data warehousing and data mining framework for Website management and patterns discovery is introduced to analyze Web user behavior. The merit of the fra...


Ontological Engineering in Data Warehousing

In our previous work, we proposed the ontology-based integration of data warehousing to make existing data warehouse systems more user-friendly, adaptive and automatic. This paper further outlines a high-level picture of ontological engineering in data warehousing. Its basic theory includes building ontology profiles for warehousing in terms of domain ontology and problem-solving ontology, an...


Toward Active XML Data Warehousing

Warehousing data is not a trivial task, particularly when dealing with huge amounts of distributed and heterogeneous data. Moreover, traditional decision support systems do not feature intelligent capabilities for integrating such complex data. Therefore, we propose an approach for intelligent decision support based on active XML warehousing. We exploit XML as a pivot language in order to unify...



Journal:
  • CoRR

Volume: abs/0705.1456

Publication date: 2002